The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation
With recent breakthroughs in artificial neural networks, deep generative
models have become one of the leading techniques for computational creativity.
Despite very promising progress on image and short sequence generation,
symbolic music generation remains a challenging problem since the structure of
compositions is usually complicated. In this study, we attempt to solve the
melody generation problem constrained by the given chord progression. This
music meta-creation problem can also be incorporated into a plan recognition
system with user inputs and predictive structural outputs. In particular, we
explore the effect of explicit architectural encoding of musical structure via
comparing two sequential generative models: LSTM (a type of RNN) and WaveNet
(dilated temporal-CNN). As far as we know, this is the first study to apply
WaveNet to symbolic music generation, as well as the first systematic
comparison between temporal-CNN and RNN for music generation. We conduct a
survey to evaluate our generations and apply the Variable Markov Oracle for
music pattern discovery. Experimental results show that encoding structure
more explicitly with a stack of dilated convolution layers significantly
improves performance, and that globally encoding the underlying chord
progression into the generation procedure yields further gains.
Comment: 8 pages, 13 figures
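To make the structure-encoding idea concrete, the following is a minimal PyTorch sketch of a WaveNet-style stack of causal dilated 1-D convolutions with a global chord condition added to the note embeddings. The vocabulary sizes, channel width, layer count, and additive conditioning scheme are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class DilatedMelodyNet(nn.Module):
    """WaveNet-style stack of dilated 1-D convolutions over note tokens,
    globally conditioned on chord tokens (illustrative sketch)."""

    def __init__(self, n_pitches=130, n_chords=24, channels=64, n_layers=6):
        super().__init__()
        self.note_emb = nn.Embedding(n_pitches, channels)
        self.chord_emb = nn.Embedding(n_chords, channels)
        # Dilations 1, 2, 4, ... double the receptive field at every layer.
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
            for i in range(n_layers)
        )
        self.out = nn.Conv1d(channels, n_pitches, kernel_size=1)

    def forward(self, notes, chords):
        # notes, chords: (batch, time) integer token sequences
        x = self.note_emb(notes).transpose(1, 2)         # (batch, channels, time)
        x = x + self.chord_emb(chords).transpose(1, 2)   # add chord condition
        for conv in self.convs:
            # Left-pad so the convolution stays causal (no future leakage).
            pad = conv.dilation[0] * (conv.kernel_size[0] - 1)
            x = x + torch.relu(conv(nn.functional.pad(x, (pad, 0))))
        return self.out(x)  # per-step pitch logits: (batch, n_pitches, time)

model = DilatedMelodyNet()
logits = model(torch.randint(0, 130, (2, 32)), torch.randint(0, 24, (2, 32)))

The residual connections and doubling dilations are what let such a stack cover long spans of the melody with few layers, which is the explicit structural bias the abstract contrasts with an LSTM.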
Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions
Automatic music generation is an interdisciplinary research topic that
combines computational creativity and semantic analysis of music to create
automatic machine improvisations. An important property of such a system is
allowing the user to specify conditions and desired properties of the generated
music. In this paper, we design a model for composing melodies given a
user-specified symbolic scenario combined with a previous music context. We add
manually labeled vectors denoting an external music quality, chord function,
which provides a low-dimensional representation of harmonic tension and
resolution. Our model is capable of generating long melodies by treating
8-beat note sequences as basic units, while sharing a consistent
rhythm-pattern structure with another specific song. The model contains two
separately trained stages: the first adopts a Conditional Variational
Autoencoder (C-VAE) to build a bijection between note sequences and their
latent representations, and the second adopts long short-term memory networks
(LSTMs) with structural conditions to continue writing future
melodies. We further exploit the disentanglement technique via C-VAE to allow
melody generation based on pitch contour information separately from
conditioning on rhythm patterns. Finally, we evaluate the proposed model using
quantitative analysis of rhythm and a subjective listening study. Results
show that the music generated by our model tends to have salient repetition
structures, rich motives, and stable rhythm patterns. The ability to generate
longer, more structured phrases from disentangled representations, combined
with semantic scenario-specification conditions, demonstrates the broad
applicability of our model.
Comment: 9 pages, 12 figures, 4 tables. In the 14th International Conference on Semantic Computing (ICSC 2020)
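As a rough illustration of the two-stage design, here is a PyTorch sketch: a small sequence VAE that maps an 8-beat note sequence to a latent code and back, and an LSTM that predicts the next unit's latent code from previous codes plus a structural condition vector. The module sizes, the GRU encoder/decoder, and the condition format are assumptions made for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class SeqVAE(nn.Module):
    """Stage 1 (sketch): encode an 8-beat note sequence to a latent code
    and decode it back, so melodies can be modeled in latent space."""

    def __init__(self, n_tokens=130, hidden=128, z_dim=32):
        super().__init__()
        self.emb = nn.Embedding(n_tokens, hidden)
        self.enc = nn.GRU(hidden, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.GRU(z_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def encode(self, x):
        _, h = self.enc(self.emb(x))              # h: (1, batch, hidden)
        h = h.squeeze(0)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def decode(self, z, length):
        z_seq = z.unsqueeze(1).repeat(1, length, 1)  # feed z at each step
        h, _ = self.dec(z_seq)
        return self.out(h)                        # per-step token logits

class LatentLSTM(nn.Module):
    """Stage 2 (sketch): predict the next unit's latent code from previous
    codes plus a structural condition (e.g., chord-function labels)."""

    def __init__(self, z_dim=32, cond_dim=8, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(z_dim + cond_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, z_dim)

    def forward(self, z_seq, cond_seq):
        h, _ = self.rnn(torch.cat([z_seq, cond_seq], dim=-1))
        return self.head(h[:, -1])                # latent code of next unit

Generating in latent space, one 8-beat unit at a time, is what allows the continuation model to stay coherent over melodies far longer than a single training segment.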
AccoMontage-3: Full-Band Accompaniment Arrangement via Sequential Style Transfer and Multi-Track Function Prior
We propose AccoMontage-3, a symbolic music automation system capable of
generating multi-track, full-band accompaniment based on the input of a lead
melody with chords (i.e., a lead sheet). The system contains three modular
components, each modelling a vital aspect of full-band composition. The first
component is a piano arranger that generates piano accompaniment for the lead
sheet by transferring texture styles to the chords using latent chord-texture
disentanglement and heuristic retrieval of texture donors. The second component
orchestrates the piano accompaniment score into a full-band arrangement according
to the orchestration style encoded by individual track functions. The third
component, which connects the previous two, is a prior model characterizing the
global structure of orchestration style over the whole piece of music. From end
to end, the system learns to generate full-band accompaniment in a
self-supervised fashion, applying style transfer at two levels of polyphonic
composition: texture and orchestration. Experiments show that our system
outperforms the baselines significantly, and the modular design offers
effective controls in a musically meaningful way.
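The modular design can be summarized as a simple pipeline in which the piano arranger, the style prior, and the orchestrator each expose one function and the system composes them end to end. The interfaces and type aliases below are hypothetical, sketched only to show the data flow described above.

from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases; the real system operates on symbolic scores.
LeadSheet = List[str]        # melody notes annotated with chord symbols
PianoScore = List[str]       # piano accompaniment events
FullBand = List[List[str]]   # one event list per instrument track

@dataclass
class AccompanimentPipeline:
    piano_arranger: Callable[[LeadSheet], PianoScore]        # texture transfer
    style_prior: Callable[[PianoScore], List[int]]           # global style plan
    orchestrator: Callable[[PianoScore, List[int]], FullBand]

    def __call__(self, lead_sheet: LeadSheet) -> FullBand:
        piano = self.piano_arranger(lead_sheet)   # component 1: piano texture
        styles = self.style_prior(piano)          # component 3: global structure
        return self.orchestrator(piano, styles)   # component 2: orchestration

# Stub usage with trivial stand-ins for the three learned components.
pipeline = AccompanimentPipeline(
    piano_arranger=lambda lead: ["piano:" + e for e in lead],
    style_prior=lambda piano: [0] * len(piano),
    orchestrator=lambda piano, styles: [piano, ["bass"] * len(styles)],
)
print(pipeline(["C4/Cmaj", "E4/Cmaj"]))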
Calliffusion: Chinese Calligraphy Generation and Style Transfer with Diffusion Modeling
In this paper, we propose Calliffusion, a system for generating high-quality
Chinese calligraphy using diffusion models. Our model architecture is based on
DDPM (Denoising Diffusion Probabilistic Models), and it is capable of
generating common characters in five different scripts and mimicking the styles
of famous calligraphers. Experiments demonstrate that our model can generate
calligraphy that is difficult to distinguish from real artworks and that our
controls for characters, scripts, and styles are effective. Moreover, we
demonstrate one-shot transfer learning, using LoRA (Low-Rank Adaptation) to
transfer Chinese calligraphy art styles to unseen characters and even
out-of-domain symbols such as English letters and digits.
Comment: 5 pages. In the International Conference on Computational Creativity (ICCC)
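Since LoRA is the mechanism behind the one-shot transfer, here is a minimal PyTorch sketch of a LoRA-wrapped linear layer: the pretrained weights are frozen and only a low-rank update is trained. The rank, scaling, and which layers of the diffusion model the adapter is applied to are assumptions; the paper's configuration may differ.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-Rank Adaptation (sketch): freeze a pretrained linear layer and
    learn only the rank-r update A @ B, enabling cheap style transfer."""

    def __init__(self, base: nn.Linear, rank=4, alpha=1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # pretrained weights frozen
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the trainable low-rank correction.
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(256, 256))
y = layer(torch.randn(1, 256))   # only A and B receive gradients

Because B starts at zero, the wrapped layer initially reproduces the pretrained model exactly, which is why a single calligraphy example can steer the style without disturbing what the model already knows.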
Motif-Centric Representation Learning for Symbolic Music
Music motif, as a conceptual building block of composition, is crucial for
music structure analysis and automatic composition. While human listeners can
identify motifs easily, existing computational models fall short in
representing motifs and their developments. The reason is that the nature of
motifs is implicit, and the diversity of motif variations extends beyond simple
repetitions and modulations. In this study, we aim to learn the implicit
relationship between motifs and their variations via representation learning,
using the Siamese network architecture and a pretraining and fine-tuning
pipeline. A regularization-based method, VICReg, is adopted for pretraining,
while contrastive learning is used for fine-tuning. Experimental results on a
retrieval-based task show that these two methods complement each other,
yielding an improvement of 12.6% in the area under the precision-recall curve.
Lastly, we visualize the acquired motif representations, offering an intuitive
comprehension of the overall structure of a music piece. As far as we know,
this work marks a noteworthy step forward in computational modeling of music
motifs. We believe it lays the foundations for future applications of motifs
in automatic music composition and music information retrieval.
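As a sketch of the pretraining objective, below is an illustrative PyTorch implementation of the VICReg loss over paired embeddings of a motif and its variation (the two branches of the Siamese network). The loss weights and epsilon follow common published defaults and are assumptions, not the authors' reported settings.

import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """VICReg (sketch): pull paired embeddings together while keeping each
    dimension varied and decorrelated, so representations do not collapse."""
    n, d = z_a.shape
    # Invariance: a motif and its variation should embed close together.
    sim = F.mse_loss(z_a, z_b)
    # Variance: keep each embedding dimension's std above 1.
    std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
    std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
    var = torch.mean(F.relu(1 - std_a)) + torch.mean(F.relu(1 - std_b))
    # Covariance: penalize off-diagonal correlations between dimensions.
    za = z_a - z_a.mean(dim=0)
    zb = z_b - z_b.mean(dim=0)
    cov_a = (za.T @ za) / (n - 1)
    cov_b = (zb.T @ zb) / (n - 1)
    off_diag = lambda m: m.flatten()[:-1].view(d - 1, d + 1)[:, 1:].flatten()
    cov = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d
    return sim_w * sim + var_w * var + cov_w * cov

z1 = torch.randn(64, 128)          # embeddings of 64 motifs
z2 = z1 + 0.1 * torch.randn(64, 128)  # embeddings of their variations
loss = vicreg_loss(z1, z2)

Unlike contrastive fine-tuning, this regularization-based objective needs no negative pairs, which is convenient when motif variations are implicit and hard to label, and explains why the two methods complement each other in the retrieval results.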